Multiple sequence alignment (MSA) is a ubiquitous problem in computational biology. Although it is NP-hard to find an optimal solution for an arbitrary number of sequences, due to the importance of this problem researchers are trying to push the limits of exact algorithms further. Since MSA can be cast as a classical path finding problem, it is attracting a growing number of AI researchers interested in heuristic search algorithms as a challenge with actual practical relevance. In this paper, we first review two previous, complementary lines of research. Based on Hirschberg's algorithm, Dynamic Programming needs O(kN^(k-1)) space to store both the search frontier and the nodes needed to reconstruct the solution path, for k sequences of length N. Best-first search, on the other hand, has the advantage of bounding the search space that has to be explored using a heuristic. However, it is necessary to maintain all explored nodes up to the final solution in order to prevent the search from re-expanding them at higher cost. Earlier approaches to reducing the Closed list are either incompatible with pruning methods for the Open list, or must retain at least the boundary of the Closed list. In this article, we present an algorithm that attempts to combine the respective advantages; like A* it uses a heuristic for pruning the search space, but reduces both the maximum Open and Closed size to O(kN^(k-1)), as in Dynamic Programming. The underlying idea is to conduct a series of searches with successively increasing upper bounds, but using the DP ordering as the key for the Open priority queue. With a suitable choice of thresholds, in practice, a running time below four times that of A* can be expected.
In our experiments we show that our algorithm outperforms one of the currently most successful algorithms for optimal multiple sequence alignments, Partial Expansion A*, both in time and memory. Moreover, we apply a refined heuristic based on optimal alignments not only of pairs of sequences, but of larger subsets. This idea is not new; however, to make it practically relevant we show that it is equally important to bound the heuristic computation appropriately, or the overhead can obliterate any possible gain. Furthermore, we discuss a number of improvements in time and space efficiency with regard to practical implementations. Our algorithm, used in conjunction with higher-dimensional heuristics, is able to calculate for the first time the optimal alignment for almost all of the problems in Reference 1 of the benchmark database BAliBASE.
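
To make the path-finding view and the frontier-only storage concrete, the following is a minimal sketch for the simplest case, k = 2, where the O(kN^(k-1)) bound reduces to O(N). It is not the algorithm presented in the paper: it computes only the cost of an optimal pairwise alignment, not the alignment itself (recovering the path in linear space is what Hirschberg's divide-and-conquer scheme adds). The function name `alignment_cost` and the unit mismatch and gap costs are assumptions chosen for illustration.

```python
def alignment_cost(a: str, b: str, mismatch: int = 1, gap: int = 1) -> int:
    """Cost of an optimal global alignment of a and b (the k = 2 case).

    An alignment is a shortest path through the (len(a)+1) x (len(b)+1)
    lattice. Only the previous row (the search frontier) is kept, so the
    memory use is O(len(b)) instead of O(len(a) * len(b)).
    The cost values are illustrative assumptions, not taken from the paper.
    """
    # Row 0: each prefix of b aligned entirely against gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: the prefix of a aligned entirely against gaps.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            # Diagonal edge: align a[i-1] with b[j-1].
            diagonal = prev[j - 1] + (0 if a[i - 1] == b[j - 1] else mismatch)
            # Vertical / horizontal edges: insert a gap in b or in a.
            curr.append(min(diagonal, prev[j] + gap, curr[j - 1] + gap))
        prev = curr  # the finished row becomes the new frontier
    return prev[-1]
```

With unit costs this coincides with edit distance, so `alignment_cost("ACGT", "AGT")` evaluates to 1 (one gap). For k sequences the same lattice has k dimensions, which is why the frontier grows to the order of N^(k-1) nodes.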